Environmental structure and competitive scoring advantages in team competitions 
In most professional sports, the structure of the environment is kept neutral so that scoring imbalances
may be attributed to differences in team skill. It thus remains unknown what impact structural
heterogeneities can have on scoring dynamics and producing competitive advantages. Applying a 
generative model of scoring dynamics to roughly 10 million team competitions drawn from an on- 
line game, we quantify the relationship between a competition's structure and its scoring dynamics. 
Despite wide structural variations, we find the same three-phase pattern in the tempo of events 
observed in many sports. Tempo and balance are highly predictable from a competition's structural 
features alone and teams exploit environmental heterogeneities for sustained competitive advantage. 
The most balanced competitions arise from specific environmental heterogeneities, not from
equally skilled teams. These results shed new light on the principles of balanced competition, and
illustrate the potential of online game data for investigating social dynamics and competition. 



Professional team sports are a rich and relatively controlled
domain through which to investigate fundamental
questions in both the dynamics within and across competitions
between groups and the factors that determine
competitive outcomes [1, 2]. With many possible actions
and many possible payoffs, such games are a kind of dynamical
competition [3], in contrast to the strategic interactions
of classic game theory [4]. A distinguishing
feature of most such competitions is their structurally
homogeneous or "level" playing field, which allows differences
in team scores to be attributed to one team being
relatively more skilled than another, or, if the difference
is small, to chance events [5, 6].

It thus remains unknown what impact structural het- 
erogeneities, like an irregular playing field, variations in 
rules, or differences in resources, may have on a competition's
internal dynamics. Heterogeneities may produce
structural competitive advantages [7], allowing a team
to perform above its skill level by exploiting these environmental
irregularities. In fact, the roles in shaping
competition dynamics and outcomes of skill, structure, 
and chance remain highly controversial, both in sports Q
and in other types of social competition [3, [3, [l3l. A better
understanding of these principles would inform the
design of novel competitive environments [111, [3, and
could shed light on competition dynamics in other domains,
such as ecology and evolutionary biology [ij], political
conflict Q and economics [14].

Online games present a novel approach to investi- 
gate these questions. Such games encompass a broad 
and growing variety of relatively controlled competitions, 
played by hundreds of millions of individuals [15] and
producing large quantities of detailed observational data. 
We study a unique data set drawn from the popular on- 
line game Halo (see SI Appendix), a kind of virtual team 
combat, which contains nearly 1 billion scoring events
across roughly 10 million diversely structured team com- 
petitions. Each of these competitions is roughly indepen- 
dent, such that team memberships are substantially ran- 
domized and no acquired resources are carried to the next 
competition. This property thus mitigates the confound- 
ing effects of cross-competition correlations present in 
professional sports and allows us to study how structural 
variations shape competition dynamics and outcomes. 

We partition these competitions according to their par- 
ticular environmental structure, competition rules, re- 
source quality and difference in team skill, and character- 
ize their scoring dynamics via a probabilistic model. The 
resulting model parameters provide a compact represen- 
tation of the associated competitive dynamics, and serve 
as targets to be explained by variation in a competition's 
structural features. 



Despite wide variation, structure has a modest impact 
on the tempo of events, but a large impact on the scoring 
balance, i.e., the difference in team scores. Additionally,
the rate of scoring events over time exhibits the same
three-phase pattern observed in professional sports [16].
Overall, structural features alone are highly predictive 
of overall competition tempo, the range of competitive 
scoring advantages, and ultimate predictability of the 
competition's outcome. Like business firms competing in
the marketplace [7], teams generally exploit environmental
and resource heterogeneities for sustained competitive
advantage. However, contrary to the pattern of professional
sports, the most balanced competitions (those
with narrow margins of victory) arise from specific environmental
heterogeneities, not from equally skilled teams
competing in homogeneous environments. These results
illustrate the rich potential of online game data for investigating
social dynamics and competition [17], clarify the
role of chance when teams are well matched, and point 
to specific design principles for balanced competitions. 



RESULTS 

Quantifying competition dynamics. We first introduce
the notion of an "ideal" competition, in which perfectly
matched teams play on a level field with no exploitable
features. Such a competition's outcome is thus
determined solely by the occurrence and accumulation of 
chance events, e.g., accidents, miscalculations, and events 
outside direct control. In this way, the highly strategic 
and carefully motivated actions of equally skilled teams 
will effectively produce purely stochastic dynamics. 

These dynamics can be described by a particularly
simple stochastic process [18]. Scoring events occur infrequently
and independently, and their pattern follows
a Poisson process with rate $\lambda_0$, a common assumption
in quantitative analysis and modeling of professional
sports [la, ll^, [23. Given that a scoring event occurs, a
fair coin determines which team accrues points from it.
The difference in scores between teams thus follows an
unbiased random walk, and scoring overall follows an
equiprobable or balanced Bernoulli scheme.
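
The ideal competition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: time is discretized into one-second steps with at most one scoring event per step (a Bernoulli approximation to the Poisson process), and the function name `simulate_ideal_competition` and its parameters are hypothetical, not taken from the paper.

```python
import random


def simulate_ideal_competition(rate, duration, seed=None):
    """Simulate one ideal competition on a discrete time grid.

    rate:     per-step probability of a scoring event (plays the role of the
              Poisson rate; assumed small so that one event per step suffices)
    duration: number of time steps in the competition
    Returns the score difference (team r minus team b) after each event.
    """
    rng = random.Random(seed)
    diff = 0
    trajectory = []
    for _ in range(duration):
        # A scoring event occurs in this step with probability `rate`.
        if rng.random() < rate:
            # A fair coin decides which team receives the point.
            diff += 1 if rng.random() < 0.5 else -1
            trajectory.append(diff)
    return trajectory


if __name__ == "__main__":
    walk = simulate_ideal_competition(rate=0.13, duration=600, seed=1)
    print(len(walk), "events; final score difference:", walk[-1] if walk else 0)
```

Each entry of the returned list differs from the previous one by exactly one point in either direction, which is the unbiased random walk in the score difference.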

Real competitions, with heterogeneous structure or 
skill differences, will deviate from this ideal. We cap- 
ture these deviations through a generalized model, which 
may be fitted directly to scoring data and whose parame- 
ters quantify the size and character of the non-ideal pat- 
terns. We then investigate the extent to which the ob- 
served non-ideal patterns can be predicted from variation 
in competition structure. 

We assume a competition between teams $r$ and $b$, and
we let $s_r(t)$ denote team $r$'s cumulative score at an intermediate
time $t < T$. The probability that $r$'s score
increases at time $t$ is given by the joint probability of a
scoring event occurring at $t$ and of $r$ scoring it. Letting
these probabilities be independent yields

\[ \Pr(\Delta s_r(t) > 0) = \Pr(\Delta s_r > 0 \mid \theta, \text{event}) \, \Pr(\text{event at } t \mid \theta) , \]

where $\theta$ parameterizes the non-ideal patterns.

Scoring events occur infrequently and independently,
and are now produced by a simple non-stationary point
process, in which the arrival of events varies linearly with
time:

\[ \Pr(\text{event at } t \mid \lambda_0, \alpha) = \lambda_0 + \alpha t . \]

The base or background rate is given by $\lambda_0$, and $\alpha$
parameterizes the non-stationarity, e.g., increasing ($\alpha > 0$)
or decreasing ($\alpha < 0$) tempo. When $\alpha = 0$, we recover
the ideal case of a Poisson process with rate $\lambda_0$.
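
As a rough illustration of the linear tempo model above, the following Python sketch draws event times on a one-second grid. The helper names and the clamping of the probability to the interval [0, 1] are assumptions made for the example; the paper does not specify how the model is simulated.

```python
import random


def event_probability(t, lambda0, alpha):
    """Per-step probability of a scoring event at time t: lambda0 + alpha * t."""
    p = lambda0 + alpha * t
    # Clamp so the value is always a valid probability (an assumption of this sketch).
    return min(max(p, 0.0), 1.0)


def simulate_event_times(lambda0, alpha, duration, seed=None):
    """Return the time steps at which a scoring event occurs."""
    rng = random.Random(seed)
    return [t for t in range(duration)
            if rng.random() < event_probability(t, lambda0, alpha)]


def expected_events(lambda0, alpha, duration):
    """Expected number of events over the whole competition."""
    return sum(event_probability(t, lambda0, alpha) for t in range(duration))


if __name__ == "__main__":
    times = simulate_event_times(lambda0=0.13, alpha=0.00002, duration=600, seed=7)
    print(len(times), "simulated events;",
          round(expected_events(0.13, 0.00002, 600), 1), "expected")
```

Setting `alpha` to zero reduces the sketch to the stationary (ideal) case, in which every time step has the same event probability.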

The score of a team follows a general Bernoulli process.
Given a scoring event, points are awarded to team $r$ with
some probability that is fixed for this competition, but
which may vary between competitions,

\[ \Pr(\Delta s_r > 0 \mid \text{event}) = c , \]

and otherwise, they are awarded to team $b$. This scoring
bias $c$ is a probabilistic measure of $r$'s competitive
advantage over $b$, e.g., from a difference in skill or from
exploitable features of the competition. When $c = 1/2$,
we recover the ideal case of a balanced Bernoulli process,
while deviations produce the more lopsided trajectories
associated with non-ideal dynamics.

Across competitions with the same structure, different
pairs of teams will exhibit different competitive advantages.
Thus, the natural explanatory target is the distribution
of the scoring imbalances $\Pr(c)$, whose natural
form is a symmetric Beta distribution [21] (see SI Appendix),
the conjugate prior for the Bernoulli process.
The result is a one-parameter model, with parameter $\beta$,
that quantifies the overall variability in competitive
advantages across a set of competitions. The ideal case of
perfectly matched teams and scoring differences due only to
chance events occurs at $c = 1/2$, which is recovered in the limit of
$\beta \to \infty$. Smaller values of $\beta$ indicate less balanced and
thus more predictable scoring dynamics across the set.
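
To make the role of the balance parameter concrete, the sketch below assumes the symmetric Beta distribution is parameterized as Beta($\beta$, $\beta$); the exact parameterization is given in the SI Appendix and may differ. Under that assumption the standard deviation of the scoring bias $c$ is $1 / (2\sqrt{2\beta + 1})$, which shrinks toward zero as $\beta$ grows. The function names are illustrative.

```python
import math
import random


def bias_std(beta):
    """Standard deviation of c when c ~ Beta(beta, beta) (assumed parameterization)."""
    return 1.0 / (2.0 * math.sqrt(2.0 * beta + 1.0))


def sample_biases(beta, n, seed=None):
    """Draw n scoring biases from the symmetric Beta distribution."""
    rng = random.Random(seed)
    return [rng.betavariate(beta, beta) for _ in range(n)]


if __name__ == "__main__":
    # Illustrative values only: a less balanced and a more balanced setting.
    for beta in (20.0, 45.0):
        draws = sample_biases(beta, 10000, seed=0)
        mean = sum(draws) / len(draws)
        print(f"beta={beta}: theoretical sd={bias_std(beta):.4f}, sample mean={mean:.3f}")
```

Larger values of `beta` concentrate the draws around 1/2, matching the statement that the ideal case is recovered as the parameter grows without bound.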

We supplement this parametric approach with a non-parametric
measure of non-ideal behavior: the predictability
of the winner from a partially unfolded competition.
Having observed the first $k$ scoring events, predicting
the winning team is a kind of classification task,
which we formalize as a Markov chain on the sequence
of team scores (see SI Appendix). For two-team competitions,
the probability that team $r$ wins, given current
scores $s_r$ and $s_b$, is

\[ \Pr(r \text{ wins} \mid s_r, s_b) = \Pr(r \text{ wins} \mid s_r + 1, s_b) \, c + \Pr(r \text{ wins} \mid s_r, s_b + 1) \, (1 - c) , \]

where $c = s_r / (s_r + s_b)$ estimates $r$'s competitive advantage.
After each event, the classifier predicts as the winner
the team with the greatest estimated odds-to-win,
and its accuracy is measured by the AUC statistic [22],
the probability of choosing the correct winning team.
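
The recursion above can be evaluated by dynamic programming once boundary conditions are fixed. The sketch below assumes a competition ends only when one team reaches a score threshold `target` (the time limit is ignored) and that the estimate of $c$ is held fixed from the current state onward; both are simplifying assumptions, and the function name is hypothetical.

```python
from functools import lru_cache


def win_probability(s_r, s_b, target, c=None):
    """Probability that team r reaches `target` points before team b.

    s_r, s_b: current scores of teams r and b
    target:   score threshold that ends the competition (assumed stopping rule)
    c:        probability that r wins the next point; if omitted it is
              estimated as s_r / (s_r + s_b), or 1/2 when no points exist yet
    """
    if c is None:
        total = s_r + s_b
        c = s_r / total if total > 0 else 0.5

    @lru_cache(maxsize=None)
    def p(a, b):
        if a >= target:
            return 1.0  # r has already won
        if b >= target:
            return 0.0  # b has already won
        # One step of the Markov chain: r scores with probability c, else b scores.
        return c * p(a + 1, b) + (1.0 - c) * p(a, b + 1)

    return p(s_r, s_b)


if __name__ == "__main__":
    print(win_probability(12, 8, target=50))
```

With `c` fixed at 1/2 and equal scores the function returns one half, the ideal-competition baseline. Note that the plug-in estimate becomes 0 or 1 when one team has no points, which makes early predictions extreme.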

The AUC versus $k$ provides complete information
about a competition's predictability but is not amenable
to our subsequent analysis. We instead use a point measure
$\rho$, defined as the ratio of the Markov classifier's AUC
to that of an ideal competition ($c = 1/2$), when 20% of
the competition has unfolded. A value of $\rho > 1$ indicates
that the competition outcomes are more predictable than
in the ideal case.

Competition data. Our data are drawn from the pop- 
ular online game Halo: Reach, and span nearly 1 billion 
scoring events across roughly 10 million diversely structured
team competitions. These competitions are divided
into 125 types according to 35 structural features defining
the spatial environment, competition rules, resource
quality, and whether teams had roughly equal skill (see
SI Appendix).

Halo competitions are a kind of real-time virtual com- 
bat. Human players guide their avatars through an 
arena containing complex terrain, coordinate actions 
with teammates through visual and audio signals, and 
encounter opponents. A scoring event occurs when one
avatar eliminates another, and this event increments the
former's team score. After a short delay, the latter
is returned to the competition at another arena location.
Competitions end either when a fixed time limit is
reached (typically 10 minutes) or when one team's score
reaches some threshold (typically 50).

FIG. 1. Patterns in tempo and score dynamics. For each of 125 competition types, the probability of a scoring event at time
$t$, in the (A) early, (B) middle and (C) end phases of a competition; and (D), the distributions of the probability that team $r$
is awarded the point. Ideal (dashed) and the global average (solid) patterns are also shown.

Only individual player skill persists across competi- 
tions. Temporary resources, whose control may yield 
a competitive advantage, are acquirable within a com- 
petition, e.g., highly defensible positions, high quality 
avatar items, and tactical information. Team member- 
ship is also temporary, being substantially randomized 
across competitions by the online system. These features 
make Halo competitions well suited for investigating the 
impact of structural heterogeneities on competition dy- 
namics. Unlike professional sports, whose team member- 
ships persist across competitions and which exhibit little 
structural variation, each Halo competition is roughly 
independent of the next, which mitigates confounding ef- 
fects in characterizing the importance of structural vari- 
ations. 

From the scoring events within a given type of competition,
we estimate both model parameters and the outcome
predictability (see SI Appendix). This produces a
set of coordinates $(\lambda_0, \alpha, \beta, \rho)$ and provides a compact
and interpretable summary of that competition type's
scoring dynamics and variability. Letting $\vec{f}$ denote the
structural features of a given competition type, explaining
variation across the estimated coordinates from variation
in $\vec{f}$ will reveal the impact of structural features on
competition dynamics, if any.

The determinants of balance $\beta$, which quantifies the
strength and distribution of competitive advantages, are
of particular interest. Players may prefer more balance
because it offers a fair chance at winning. Or, they may
prefer less balance because it offers greater reward for
the risk. In these competitions, more balance moderately
correlates with a lower probability that at least one player
will prematurely leave the field of play (correlation statistic 0.43, see
SI Appendix), a typically voluntary action. Thus, players
exhibit a moderate but real preference for more balanced,
i.e., more ideal, competitions, whose outcomes are
less predictable, whose final score differences are smaller,
and whose dynamics are effectively more like a simple
stochastic process.

Patterns in Tempo and Score Dynamics. We first
verify that our generative model effectively captures the
true scoring dynamics of these competitions and examine
whether they exhibit patterns similar to those of professional
sports.

Across all competition types, we find a consistent 
three-phase non-stationary pattern in the tempo of scor- 
ing events, i.e., the probability of a scoring event as a 
function of time elapsed or time remaining. Specifically, 
we find an early phase of little or uneven activity, a pro- 
tracted middle phase of slow and steadily increasing ac- 
tivity, and an end phase of either slightly decreased or 
markedly increased activity (Fig. 1A-C).

The early- and end-phase patterns are caused by
boundary effects in the length of competition, and these
are also observed in professional sports [16]. Early in
a competition, players require some time to move from
their initial positions to their first scoring opportunities,
which suppresses the tempo of events relative to the ideal
case. Although the shape of this early phase varies moderately
by competition type (Fig. 1A), after 20-30 seconds
these variations largely disappear and the tempo
transitions into the more stable middle phase.

Similarly, near a competition's end, the impending
cessation of scoring opportunities encourages different
strategic choices [43 than in the early or middle phases.
Here, we observe either slightly decreased or strongly
increased tempo (Fig. 1C), depending on whether the
competition type's particular rules provide an incentive
for risk taking in the final seconds. When the incentive
is present, the tempo increases dramatically just
before the competition ends, as players take greater
risks for the win, a pattern also observed in professional
sports [la, U^. When the incentive is absent, players
instead adopt defensive positions to deny the opposing
team additional points, leading to decreased scoring
rates, a pattern not typically observed in sports.

In contrast, the middle phase's tempo exhibits a 
roughly linear increase over time (Fig. 1B), which agrees
with our generative model for event timing. To estimate 
our tempo model parameters, we eliminate the bound- 
ary effects by focusing on events in this phase alone (see 
SI Appendix). Across competition types, both the base 
tempo and the acceleration vary widely: base rates can 
vary by up to a factor of two and we observe increases 
in tempo of 5-20% over the phase. Within-competition 
learning is one likely explanation for this increase [23].
Through trial and error, teams may learn how and where 
to produce scoring events, which progressively reduces 
the time spent searching for new scoring opportunities. 

To understand the variation in the accumulation of 
points, we examine the distributions of scoring biases 
across competition types. For a particular competition, 
the scoring bias is estimated as the fraction of points held 
by an arbitrarily labeled team r. We find that all com- 
petition types exhibit moderately non-ideal variations in 
scoring biases (Fig. 1D), i.e., they are consistently dispersed
from the ideal case of $c = 1/2$. As with the
competition tempo in the middle phase, the degree of 
dispersion varies substantially across competition types, 
suggesting a significant role for structural variables. 

As a further test of our generative model's quality for 
these competitions, we estimate $\lambda_0$, $\alpha$ and $\beta$ from the entire
data set, draw many synthetic competitions from the
fitted model, and consider whether the simulated scoring 
dynamics are similar to those in the empirical data. The 
results indicate that the simulated competitions match 
the observed sequences on multiple scoring and timing 
statistics unrelated to parameter estimation (see SI Ap- 
pendix). This quantitative agreement indicates that our 
model successfully captures the important dynamical fea- 
tures of our competitions. 
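
A hedged sketch of how synthetic competitions might be drawn from the fitted model is given below. It combines the three ingredients of the generative model: a scoring bias drawn from a symmetric Beta distribution, event timing with a linearly varying per-step probability, and Bernoulli assignment of each point. The Beta(beta, beta) parameterization, the one-second time grid, the omission of the score threshold, and all function names and parameter values are assumptions of the example rather than the authors' procedure.

```python
import random


def simulate_competition(lambda0, alpha, beta, duration, rng):
    """Simulate one competition; returns (c, score_r, score_b)."""
    # Scoring bias for this pair of teams (assumed Beta(beta, beta)).
    c = rng.betavariate(beta, beta)
    score_r = score_b = 0
    for t in range(duration):
        # Linear tempo model, clamped to a valid probability.
        p_event = min(max(lambda0 + alpha * t, 0.0), 1.0)
        if rng.random() < p_event:
            # Team r receives the point with probability c, otherwise team b.
            if rng.random() < c:
                score_r += 1
            else:
                score_b += 1
    return c, score_r, score_b


def simulate_many(lambda0, alpha, beta, duration, n, seed=0):
    """Draw n independent synthetic competitions from the same parameters."""
    rng = random.Random(seed)
    return [simulate_competition(lambda0, alpha, beta, duration, rng)
            for _ in range(n)]


if __name__ == "__main__":
    runs = simulate_many(lambda0=0.15, alpha=0.00002, beta=30.0,
                         duration=600, n=1000, seed=1)
    fractions = [r / (r + b) for _, r, b in runs if r + b > 0]
    print("mean fraction of points held by team r:",
          round(sum(fractions) / len(fractions), 3))
```

Summary statistics of such synthetic runs (for example the distribution of final score fractions) could then be compared with the same statistics computed from observed competitions.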

How structure shapes dynamics. We now investi- 
gate four specific types of structure and their impact on 
the estimated competition dynamics. These analyses are
intended to shed light on how specific structures may 
shape dynamics, and will aid the interpretation of our 
systematic analysis below. 



TABLE I. Estimated tempo and scoring parameters for four
dimensions of competition variation, illustrating a substantial
impact of structure on dynamics. Values in parentheses give
the bootstrap uncertainty.

feature       variation    balance        base tempo                  acceleration
                           $\beta$        $\lambda_0$ (x10^-?)        $\alpha$ (x10^-?)
skill         equal        45.9(0.35)     166(0.1)                    7.09(0.09)
skill         unequal      20.9(0.22)     160(0.1)                    7.18(0.02)
environment   neutral      47.9(1.20)     169(0.4)                    9.09(0.22)
environment   irregular    23.9(0.67)     147(0.3)                    7.49(0.21)
scoring       standard     41.7(0.36)     185(0.2)                    8.45(0.16)
scoring       easy         30.3(0.71)     158(1.1)                    9.16(0.64)
resources     versatile    20.2(0.52)     153(0.2)                    7.08(0.13)
resources     limited      41.7(1.04)     166(0.3)                    8.49(0.21)
all           -            29.5(0.21)     163(0.1)                    7.13(0.05)

Team skill differences. When assigning individuals to
a new competition instance, the online system uses a
matchmaking algorithm to substantially randomize team
composition. This algorithm operates in two modes. For
players who have completed a moderate number of competitions,
it adjusts team memberships so that teams
have roughly equal total skill. Skill estimates are derived
from a Bayesian generalization of the popular Elo
rating system of individual player skill [6]. Otherwise,
teams are assembled without regard to player skill. We
examine the differences in our model parameters for all
competitions constructed under each of the two modes.

Differences in skill have a substantial impact on 
competition balance, as we might expect. However, 
they have little impact on competition tempo (Table I,
Fig. S4A). When teams have roughly equal skill, scoring
is more balanced than when the equal-skill control
is absent ($\beta = 45.9 \pm 0.35$ versus $20.9 \pm 0.22$). This
difference implies that well-matched teams produce 
substantially more ideal competitions, have smaller 
competitive advantages, and exhibit overall dynamics 
that are closer to those produced by a fair coin. In 
effect, reducing the difference in team skill serves to 
amplify the importance of chance events, i.e., accidents 
and miscalculations. 

Physical environment. The arenas for these competi- 
tions are typically complex virtual terrains, and may con- 
tain large outdoor spaces, complicated indoor corridor 
systems, buildings with multiple levels, defensible posi- 
tions, high ground, etc. We compare model parameters 
for all competitions taking place within two structurally 
distinct environments: one is largely neutral, exhibiting 
strong spatial symmetries and few features like defensi- 
ble locations that might offer tactical advantage, while 
the other is strongly irregular, with an asymmetric and 
strongly vertical spatial structure, truncated sight lines, 
and at least one defensible location. 

Overall, the more symmetric environment produces 
substantially more balanced outcomes and higher scoring 
rates than the irregular one. In fact, the observed dif- 
ference in balance parameters is roughly as large as the 
difference induced by the equal-skill criterion (Table I,
Fig. S4B). This suggests that increasing the homogeneity
of the competitive environment, e.g., introducing
symmetries, removing defensible positions, etc., serves
to limit environmental opportunities for competitive
advantage. Much like eliminating differences in skill,
simpler environments effectively amplify the importance
of chance events, making competition scoring more ideal.

FIG. 2. Equally spaced quantiles of joint distributions across
125 competition types of (A) base scoring rate $\lambda_0$ and acceleration
$\alpha$, and (B) outcome balance $\beta$ and predictability ratio
$\rho$. For event timing parameters, we observe little statistical
correlation, while greater balance is strongly correlated with
lower outcome predictability.

Scoring difficulty. Few studies have examined the differ- 
ence in competition dynamics caused by variations in the 
rules of the competition. Our data include several vari- 
ations of this kind, and we examine one particular vari- 
ant to shed light on how small changes in rules may im- 
pact competition dynamics. A popular group of compe- 
tition types alters the standard scoring rules by reducing 
the threshold required to eliminate an opposing avatar 
and by slightly limiting each player's visual field. These 
changes make scoring opportunities easier to exploit, and 
we compare the estimated model parameters for all com- 
petitions of the standard and easy scoring types. 

Lowering the threshold for scoring has a substantial
impact on competition dynamics (Table I, Fig. S4C),
with easier scoring rules producing less balanced outcomes.
The size of this difference is nearly half as large
as the impact of the equal-skill criterion. Additionally,
the lower threshold decreases the base scoring rate by
15% but increases the acceleration by roughly 8% over
those of standard competitions. The implication is that
lowering the barrier to scoring skews the playing field,
allowing skilled players to exploit either their skill-based
competitive advantage or other structurally-derived
advantages.

Resource quality. Each competition has a fixed set of
acquirable resources, which players use to score points. 
Each resource belongs to one of two classes, which we 
label "versatile" and "limited." Versatile resources are 
generally of higher quality and are more effective for scor- 
ing points. When resources of both classes are present in 
a competition, 80% of scoring events are associated with 
the versatile class, illustrating a strong player preference 
for more effective tools. To clearly separate their effects, 
we examine competitions with either only versatile- or 
only limited-class resources. 

Limited-class competitions produce moderately
higher base and acceleration rates than versatile-class
competitions, indicating an overall faster tempo. Furthermore,
competitions with only limited-class resources
produce substantially more balanced scoring outcomes
($\beta = 41.7 \pm 1.04$ versus $20.2 \pm 0.52$; Table I, Fig.
S4D), a difference as large as that of the equal-skill
criterion. Just as environmental structures can be 
exploited for competitive advantages, differences in the 
quality of acquirable resources also represent exploitable 
structural heterogeneities, and limiting such variations 
can effectively level a playing field to produce more ideal 
dynamics. 

Structural determinants of competitive dynamics.
Each competition type defines a point on a $(\lambda_0, \alpha, \beta, \rho)$-manifold,
and the distribution of these points describes
the observed variability in competition dynamics. We 
now consider the degree to which a competition's position 
in this coordinate space is predictable from its structural 
features alone. 

The joint distribution of the model timing parameters
$\lambda_0$ and $\alpha$ is broad and shows little internal
structure (Fig. 2A). The typical scoring base rate is
roughly one event per 7.5 seconds, with variations of 2.5 s
in either direction. Additionally, nearly all competition
types show modest acceleration rates, with an increase
of 10-12% over the middle phase of competition being
common. The estimated balance parameters $\beta$ are also
broadly distributed, indicating a wide range of competitive
advantages. The typical competition type has $\beta$
between 20 and 30, but some have values as large as 50
or as small as 10 (Fig. 2B). We also observe a strong
negative correlation between scoring balance $\beta$ and the
predictability $\rho$ of a competition's winner, although with
some variation, particularly in the low-$\beta$ regime.
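
The figures quoted above can be converted into rates with simple arithmetic. The sketch below assumes that the mean time between events is the reciprocal of the base rate and that the acceleration acts linearly over a phase of known length; the helper names and the numbers passed to `relative_increase` are illustrative only.

```python
def rate_from_interval(seconds_per_event):
    """Base rate (events per second) implied by a mean time between events."""
    return 1.0 / seconds_per_event


def relative_increase(lambda0, alpha, phase_length):
    """Fractional rise in tempo over a phase under the model lambda0 + alpha * t."""
    return alpha * phase_length / lambda0


if __name__ == "__main__":
    typical = rate_from_interval(7.5)   # about 0.13 events per second
    slow = rate_from_interval(10.0)     # 7.5 s + 2.5 s between events
    fast = rate_from_interval(5.0)      # 7.5 s - 2.5 s between events
    print(f"typical={typical:.3f}/s, range {slow:.2f}/s to {fast:.2f}/s")
    print(f"fast/slow ratio: {fast / slow:.1f}")
    # Hypothetical alpha chosen only to show the calculation.
    print(f"tempo increase: {relative_increase(0.133, 0.00003, 500):.1%}")
```

One event per 5 to 10 seconds corresponds to rates that differ by a factor of two, consistent with the spread in base rates reported for the middle phase.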

TABLE II. Ordered multivariate regression coefficients, with
uncertainty, for predicting $\beta$, $\lambda_0$ and $\alpha$ of standard-type competitions
from structural features alone, and the corresponding
fraction of variance explained $r^2$. Here, we show only the
features that are statistically significant at the 0.001 level (t-test);
Table S6 provides the full results.

parameter     code   structural feature        coefficient    std. error    $r^2$
$\lambda_0$   E5     indoor terrain             0.082         0.008         0.96
              E11    large arena                0.059         0.003
              E1     open terrain               0.045         0.009
              E3     circular terrain           0.029         0.006
              E9     outdoor terrain            0.023         0.001
              S1     equally skilled teams      0.005         0.001
              R1     short & medium range      -0.021         0.008
              R4     short & long range        -0.030         0.008
              R15    high-quality resources    -0.032         0.006
              E2     vertical environment      -0.081         0.006
              E7     high ground               -0.081         0.005
$\alpha$      R12    long range                 1.9 x 10^-?   8.1 x 10^-?   0.65
              S1     equally skilled teams      2.9 x 10^-?   1.7 x 10^-?
$\log \beta$  E5     indoor terrain             1.849         0.320         0.93
              E1     open terrain               1.391         0.371
              E11    large arena                1.123         0.141
              S1     equally skilled teams      0.822         0.034
              E9     outdoor terrain            0.481         0.076
              E6     defensible positions      -0.813         0.150
              E2     vertical environment      -1.645         0.336
              E7     high ground               -2.126         0.224
$\rho$        E7     high ground                0.138         0.022         0.89
              E2     vertical environment       0.123         0.024
              E6     defensible positions       0.061         0.014
              E9     outdoor terrain           -0.036         0.007
              S1     equally skilled teams     -0.055         0.003
              E11    large arena               -0.089         0.013

Predicting dynamics from structure. The extent to
which a competition's dynamical variables $(\lambda_0, \alpha, \beta, \rho)$
are predictable from its structural variables $\vec{f}$ provides
a direct measure of how competition structure shapes
dynamics. Thirty-five structural features, divided into
resources (R), environment (E), team skill (S), and rules
(P) categories, were used to identify 125 distinct types of
competition. Regressing the estimated model parameters
on these structural features quantifies the overall
predictability of dynamics from structure. The relative
importance of these features provides additional insight.
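
The kind of regression described above can be illustrated with ordinary least squares on binary structural indicators. The sketch below is a generic pure-Python implementation with an intercept, not the authors' code; the feature matrix and target values in the usage example are made up for illustration, and the solver raises a `ZeroDivisionError` if the features are collinear.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(features, targets):
    """Least-squares fit of targets on features plus an intercept.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    rows = [[1.0] + list(f) for f in features]
    p = len(rows[0])
    # Normal equations: (X^T X) coef = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    coef = solve_linear_system(xtx, xty)
    pred = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(targets) / len(targets)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(targets, pred))
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return coef, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical data: two binary features for five competition types.
    features = [(0, 0), (1, 0), (0, 1), (1, 1), (1, 0)]
    log_beta = [3.0, 3.9, 2.1, 3.2, 4.1]
    coef, r2 = fit_ols(features, log_beta)
    print([round(c, 3) for c in coef], round(r2, 3))
```

The returned `r_squared` is the fraction of variance in the target explained by the features, the same quantity reported as $r^2$ for each dynamical parameter.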

Overall, competition dynamics are highly predictable
from structure alone (Table II), with structural variables
explaining 65-96% of the variance in individual dynamical
parameters. Because the coverage across our feature
space is sparse, we performed three additional tests
to determine the robustness of our results. Both multiple
and stepwise regressions produce models of nearly
equal quality and assign features nearly the same relative
importances. Randomizing the association of structural
and dynamical variables yields non-significant correlations
(see SI Appendix), indicating our results are
reliable.
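
The randomization check mentioned above can be illustrated with a simple permutation test. The details of the authors' procedure are in the SI Appendix; the sketch below is a generic single-feature version in which the target values are shuffled relative to the feature values and the squared correlation is recomputed each time. The function names and the data in the usage example are hypothetical.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def permutation_p_value(x, y, n_shuffles=1000, seed=0):
    """Fraction of shuffles whose r^2 is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = r_squared(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break the association between x and y
        if r_squared(x, shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    feature = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
    target = [2.1, 2.4, 1.9, 2.2, 3.6, 3.9, 3.4, 3.8, 3.5, 2.0]
    print(permutation_p_value(feature, target))
```

A small returned value indicates that the observed association is unlikely to arise when features and dynamics are paired at random.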

Competition structure has the largest impact on base
rate $\lambda_0$ ($r^2 = 0.96$), and features describing neutral or homogeneous
environments play the dominant role in setting
its value. The base scoring rate is effectively determined
by the "encounter rate" between scoring opportunities
and competitors. In these competitions, an
encounter requires two individuals to locate and engage
each other; thus, small, neutral environments generate
these encounters more often than large, irregular ones.
Competitions between equally-skilled teams exhibit a
higher encounter rate, but only marginally, as the skill
coefficient is four times smaller in absolute value than that of any
other statistically significant feature.

The change in scoring rate $\alpha$ is moderately well predicted
by structure ($r^2 = 0.65$), and competitions with
resources that operate across long ranges and with well-matched
teams exhibit less acceleration over the middle
phase. These resources make it easier to locate and exploit
the next scoring opportunity, thus mitigating the
difficulty of searching for new opportunities within large
or irregular environments. Similarly, skilled competitors
tend to have prior experience with the location of resources
and strategic environmental structures, improving
their search efficiency and lowering $\alpha$.

The scoring balance $\beta$, which measures the strength
of the associated competitive advantages, is highly predictable
from structure ($r^2 = 0.93$, regression on $\log \beta$),
as is the relative predictability $\rho$ of the winning team
($r^2 = 0.89$). Having well-matched teams, however, is
only moderately important for increasing balance, and
well balanced scoring is typically derived from large,
neutral environments, a situation similar to professional
team sports with their level playing fields. However, the
single feature that produces the most balanced competitions,
by a factor of two, is indoor terrain, i.e., rooms
and corridors. This particular form of spatial heterogeneity
may effectively handicap all competitors by limiting their
spatial awareness, thus mitigating other competitive advantages,
including those derived from greater skill or
more versatile resources, thereby making scoring opportunities
and outcomes less predictable and more ideal.

In contrast, the most imbalanced and predictable competitions
are those with controllable or strategically valuable
environmental features like high ground or defensible
positions. For setting the values of $\beta$ and $\rho$, such
features are at least as important, but opposite in sign, 
to having teams of equal skill. These strategically impor- 
tant environmental features can thus effectively upset the 
competitive balance produced by well-matched teams by 
providing one team with a sustained competitive advan- 
tage throughout the competition. 

Surprisingly, variations in rules, including reduced spatial
awareness, weakened defensive capabilities, or a lower
threshold for scoring, were not statistically significant
predictors. None of these features produced a measurable
impact on the tempo or balance of scoring within competitions,
once the effects of other features were taken into
account.



DISCUSSION 

Although professional sports are often considered models
of team competition [18, 19, 20, 21, 22], their limited
structural variation provides few opportunities for under- 
standing how competition structure can shape competi- 
tion dynamics. Our results shed new light on these and 
other fundamental questions about human social dynam- 
ics and competition. 

In particular, heterogeneities in the spatial environ- 
ment, available resources, competition rules, and team 
skill exert a strong influence on the balance and tempo 
of scoring within a competition. For the virtual team- 
combat simulation studied here, spatial structure plays 
the most important role in producing competitive ad- 
vantages, with skill and resource differences assuming 
supporting roles. It is thus not a superficial analogy to 
say that like business firms leveraging heterogeneous and 
scarce resources for sustained competitive advantage in 
a marketplace [3], teams in Halo leverage environmental 
and resource heterogeneities, like high ground and defen- 
sible positions, toward the same ends. 

But unlike the pattern of either business firms or pro- 
fessional sports teams, some heterogeneities — in the case 
of Halo, significant indoor terrain — can effectively neu- 
tralize competitive advantages normally derived from 
exploitable structural features. When these "leveling" 
features are present, scoring outcomes are substantially 
more balanced than when they are absent, and this lev- 
eling effect is stronger than the one produced by hav- 
ing equally skilled teams. Although the precise mecha- 
nisms of these leveling effects remain unknown, their ex- 
istence implies that competitive advantages are derived 
from specific mechanisms whose effects can be neutral- 
ized by other mechanisms. A better understanding of 
these mechanisms could be derived from controlled ex- 
periments with level design, and may facilitate the design 
of inhomogeneous competitive environments that never- 
theless exhibit the balanced dynamics that homogeneous 
environments produce. 

Otherwise, the most ideal competitions do indeed oc- 
cur in large neutral spaces between well-matched teams. 
It is thus no accident that professional team sports are of- 
ten played in precisely this type of environment: absent 
spatial or resource heterogeneity, competition between 
skilled teams is significantly more ideal. Counterintuitively,
the more ideal a competition, the more effectively
it may be described as a purely random process, not de- 
spite but in fact because of the significant strategic and 
tactical effort behind individual events. That is, the more 
ideal a competition, the greater the role of chance events 
like miscalculations and accidents in determining the out- 
come. We note, however, that replacing the underlying 
competition mechanics by actual coin flipping seems un- 
likely to produce the same level or type of engagement 
among players and spectators. 

The three-phase pattern in the tempo of events in Halo 
competitions is strikingly similar to the pattern observed 
in professional team sports [16]. Yet the underlying structures
of most professional sports and a team combat simulation
could hardly be more different. In the former,
goals have fixed locations, the environment and
within-competition resources are homogeneous, and teams are
highly trained and persistent. In the latter, goals are 
highly mobile, the environment and within-competition 
resources are heterogeneous, and teams are largely non- 
persistent. The existence of a common dynamical pattern 
despite such differences suggests that it may be a univer- 
sal feature of team competitions. The elucidation of its 
origin is an important open question. 

Finally, we omitted explicit roles for within-team variables
like team composition [26], coordination [27], and
player characteristics. Their impact is implicit within 
the estimated model parameters, whose variation is well 
explained by structural variables alone. This particu- 
lar result is likely supported by the substantial random- 
ization in team membership across Halo competitions, 
which serves to mitigate any significant differences in 
team composition. Player and team characteristics likely 
play a more significant role in determining the dynam- 
ics in competitions with persistent teams or homoge- 
neous environments, as in professional sports. A broad 
study of within-competition dynamics across fundamen- 
tally different types of competition may shed complemen- 
tary light on the origin of competitive advantages, the 
mechanisms by which specific features promote or dis- 
courage balanced outcomes, and the fundamental laws of 
competitive dynamics, if any. 
I. DETAILED DESCRIPTION OF DATA

Halo: Reach is a popular online game played by nearly 
20 million individuals, and was the third most popular US
video game of 2010 [1]. It was publicly released by Bungie
Inc., a former subdivision of Microsoft Game Studios, on 
14 September 2010, and since then, players have gener- 
ated more than 1 billion competitions. Reach is an ex- 
ample of the kind of virtual combat simulation known as 
a "first-person shooter" or FPS. Within the Reach sys- 
tem, players choose from among roughly seven primary 
game types and numerous subtypes, which are played on 
more than 33 terrain maps with 74 weapons (the precise 
number of maps and weapons has varied over time, as 
the publisher has periodically revised the online content 
through downloadable updates). 

Instances of the game can be played alone, with or 
against other players via the Xbox Live online system. 
Participation in this system requires an account, which is 
distinguished by a unique and publicly known "gamertag"
or online pseudonym, chosen by the player. In the Reach 
system, both individual game and player summaries were 
made publicly available through the Halo Reach Stats 
API. Through this digital interface, we collected de- 
tailed data on the first 53 million competition instances 
(roughly 1TB of data). 

Within our sample, there are three basic game types: 
campaign games, a sequence of story-driven, player- 
versus-environment (PvE) maps that many players com- 
plete first; firefight games (also PvE), in which a team 
of human-controlled players battle successive waves of 
computer-controlled enemies; and competitive games, a
player-versus-player (PvP) game type, in which teams of
equal size (2, 4, 6 or 8 players) compete to either
be the first to reach some fixed number of points or have
the largest score after a fixed length of time. (The precise
number of players per team, number of points required to
win and length of a game depend on the game subtype.)
Here, we focus on the most common type of competitive 
game, with teams of 4 players, a time limit of 600 seconds 
and a score limit of 50 points. 

Among other information, each competition instance 
game file includes the sequence of scoring events at the 
per-second resolution and a list of players by team. Scoring
events are annotated with the gamertag of the player
generating the event, the number of points scored and
the player giving up the points (if applicable).

Unlike professional sports, team composition and 
player resources in Reach competitions are not persis- 
tent across instances. The only attribute that persists 
is individual player skill, and thus each new instance is 
a kind of "blank slate." To join a new instance, individual
players or small groups (often friends [2]) first
enter a general pool of available competitors. A Bayesian
"matchmaking" algorithm, which seeks to build teams of
equal skill [3], then fills teams in the new instance by
drawing from this pool. This process substantially ran- 
domizes the pairing of individuals within teams and the 
pairing of teams across instances. Because of the match- 
making algorithm and the large size of the pools, a pair of 
non-friend players are highly unlikely to be paired again 
in a new instance; friends may elect to be matched as a 
unit by forming a "party," a special grouping that the 
matchmaking algorithm recognizes. 

The non-persistence and the randomization are fea- 
tures absent from most studies of team performance or 
competition [4-6], and serve to mitigate the confounding
effects of persistent teams and resources present in
most competitive systems, e.g., professional sports. For 
our purposes, these features make Reach competitions 
a unique source of data for studying behavioral dynam- 
ics within competitions and how structural factors shape 
this behavior. 

In competitive games, players move their avatars 
through the game map simultaneously, in real-time, nav- 
igating complex terrain, acquiring avatar modifications 
and encountering opponents. Teammates may interact 
through a private voice channel, or through visual sig- 
nals. Points are scored by dealing sufficient damage to 
eliminate an opposing avatar and for each such success, 
a team gains a single point. Eliminated players must 
then wait several seconds before their avatar is placed 
back into the game at one of several specified "spawn" 
locations, equipped with "default" avatar resources that 
depend on the competition type being played. 

For our analysis, we exclude all PvE games and all 
PvP games containing corrupt scoring event data. (Our 
analysis suggests no specific pattern to the corruption.) 
In our primary analyses, we further restricted our sample 
to PvP competitions (i) between two teams of 4 players 
and (ii) where no player exited the game early. This 
latter criterion was relaxed to calculate the relationship 
between dropouts and $\beta$ (see Section VIII).



II. GENERATIVE MODEL FOR SCORING 
EVENT TIMING AND BALANCE 

The timing and balance (which team receives the 
point) of scoring events within a competition are modeled 
by a conditionally independent Markov process, where an 
incremental change to a team's score $s_r$ is given by

$$\Pr(\Delta s_r(t) > 0) = \Pr(\Delta s_r > 0 \mid \theta, \text{event}) \, \Pr(\text{event at } t \mid \theta)$$

where $\theta$ parameterizes the impact of non-ideal competitive
features. That is, the probability that team r's score
increases at some time t is the probability that a scor- 
ing event occurred at time t and that the resulting point 
was awarded to r. Furthermore, team labels r and b are 
arbitrary, and we choose r as our reference team below. 
The generation of scoring events is given by a non-stationary
Poisson process, in which the probability that
a scoring event occurs at time $t$ varies linearly with time:

$$\Pr(\text{event at } t \mid \lambda_0, \alpha) = \lambda_0 + \alpha t \tag{1}$$

where $\lambda_0$ is the event background rate and $\alpha$ is the
acceleration. When $\alpha = 0$, we recover the stationary Poisson
process expected for ideal competitions.

In a real competition, we observe $n < T$ scoring events,
for a competition lasting $T$ units of time. Let $\{t_i\}$ denote
the observed times of these events, and $\{u_j\}$ the times at
which no event was observed. The model parameters $\lambda_0$
and $\alpha$ are then jointly estimated by directly maximizing
the generative model's log-likelihood function:

$$\ln \mathcal{L} = \sum_{i=1}^{n} \ln(\lambda_0 + \alpha t_i) + \sum_{j=1}^{T-n} \ln(1 - \lambda_0 - \alpha u_j) . \tag{2}$$

To limit the biasing effect of the highly non-stationary
behavior found in the early- and end-phases of competitions
(see main text), we restrict our estimation to events
occurring in the middle phase, specifically $50 < t < 300$.
This heuristic provides robust conclusions: the estimated
timing parameters are very close to those found using
smaller middle-phase windows, and the global average
trend within this window is roughly linear (Fig. S1A).
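
As a rough illustration of how Eq. (2) can be maximized, the following Python sketch fits $\lambda_0$ and $\alpha$ to per-second event counts pooled over many competitions. The function name `fit_linear_rate`, the pooling into per-second counts, the choice of Newton's method, and the parameter values in the demonstration are all assumptions made here for illustration; the text states only that the log-likelihood is maximized directly.

```python
def fit_linear_rate(event_counts, n_games, max_iter=50):
    """Maximize Eq. (2), summed over games, by Newton's method.

    event_counts maps each second t in the fitting window to the number of
    games (out of n_games) with a scoring event in that second.
    Returns the estimates (lam0, alpha).
    """
    n_seconds = len(event_counts)
    # Start from a constant rate (alpha = 0) equal to the mean event frequency.
    lam0 = sum(event_counts.values()) / (n_games * n_seconds)
    alpha = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for t, k in event_counts.items():
            p = lam0 + alpha * t                                # Eq. (1)
            d1 = k / p - (n_games - k) / (1.0 - p)              # d lnL / dp
            d2 = -k / p ** 2 - (n_games - k) / (1.0 - p) ** 2   # d2 lnL / dp2
            g0 += d1
            g1 += d1 * t
            h00 += d2
            h01 += d2 * t
            h11 += d2 * t * t
        det = h00 * h11 - h01 * h01
        # Newton step: solve H * step = -g for the 2x2 Hessian H.
        step0 = -(h11 * g0 - h01 * g1) / det
        step1 = -(h00 * g1 - h01 * g0) / det
        lam0 += step0
        alpha += step1
        if abs(step0) < 1e-12 and abs(step1) < 1e-15:
            break
    return lam0, alpha


# Noise-free synthetic counts generated from hypothetical parameter values.
TRUE_LAM0, TRUE_ALPHA, N_GAMES = 0.16, 7e-5, 1000
counts = {t: N_GAMES * (TRUE_LAM0 + TRUE_ALPHA * t) for t in range(51, 300)}
lam0_hat, alpha_hat = fit_linear_rate(counts, N_GAMES)
print(lam0_hat, alpha_hat)
```

Because the log-likelihood is concave in both parameters, Newton's method started from a constant rate is a reasonable choice; the sketch has no safeguard against a step that pushes a probability outside (0, 1), which real data may require.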

For two teams $r$ and $b$, the outcome of a scoring event
(which team receives the point) is given by a biased
Bernoulli process, in which the probability that an event
increases the score of team $i$ is

$$\Pr(s_i \text{ increases} \mid \theta) = \begin{cases} c & i = r \\ 1 - c & i = b , \end{cases}$$



where $c \in [0,1]$ represents the competitive advantage
(outcome bias) of the $r$ team. In our model system,
99.99% of scoring events yield a single point. Although
we do not consider the possibility here, in general, the
number of points produced by an event could be drawn
from some distribution. Thus, the probability that the
competition ends with final scores $S_r$ and $S_b$ is

$$\Pr(S_r, S_b \mid c) = c^{S_r} (1 - c)^{S_b} \tag{3}$$

where $c$ denotes the competitive advantage (scoring bias)
of team $r$ over team $b$.

parameter                   estimate, global
$\beta$      balance        29.50 ± 0.21
$\lambda_0$  base rate      0.1620 ± 0.0001
$\alpha$     acceleration   7.00 × 10"^ ± 0.05 × 10"'^

TABLE S1. Estimated global scoring tempo and balance parameters,
with bootstrap uncertainty estimate.

Because team composition varies across competition
instances, the competitive advantage of $r$ is modeled as
a random variable, drawn from some distribution $\Pr(c)$.
The natural choice of the form of this distribution is a
symmetric Beta distribution with parameter $\beta$, the conjugate
prior for the Bernoulli scheme. (We note that
the prior distribution must be symmetric about $c = 1/2$
because team labels are arbitrary.) This distributional
assumption agrees well with the global empirical distribution
of biases $c$ (Fig. S1A, inset).

The posterior probability of observing final scores
$\{S_r, S_b\}_k$ in a competition instance $k$ is given by their
Bernoulli likelihood, weighted by the probability of $c$
(Eq. (3)). Given $N$ such instances, the total posterior
probability of the observed final scores is

$$\begin{aligned}
\Pr(\beta \mid \{S_r, S_b\}) &\propto \prod_{k=1}^{N} \int_0^1 \Pr(\{S_r, S_b\}_k \mid c) \, \Pr(c \mid \beta) \, dc \\
&= \prod_{k=1}^{N} \int_0^1 \frac{c^{S_{r_k} + \beta - 1} (1 - c)^{S_{b_k} + \beta - 1}}{B(\beta, \beta)} \, dc \\
&= \prod_{k=1}^{N} \frac{B(S_{r_k} + \beta, S_{b_k} + \beta)}{B(\beta, \beta)}
\end{aligned} \tag{4}$$

where $B(a, b)$ is the Beta function.

We estimate the competition balance parameter by numerically
maximizing the logarithm of Eq. (4) with respect to $\beta$,

$$\ln \mathcal{L} = \sum_{k=1}^{N} \left\{ \ln[B(S_{r_k} + \beta, S_{b_k} + \beta)] - \ln[B(\beta, \beta)] \right\} . \tag{5}$$

The resulting maximum likelihood estimate $\hat{\beta}$ provides a
direct measurement of the overall balance within a set of
competition instances: when $\hat{\beta} \to \infty$, we recover the fair
coin $c = 1/2$ expected for ideal competitions.
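
The maximization of Eq. (5) can be sketched in a few lines of Python using only the standard library. The function names, the golden-section search on a logarithmic scale, the search bounds, and the two small sets of final scores are assumptions introduced here for illustration; the text does not say which numerical optimizer was used.

```python
import math


def log_beta(a, b):
    """ln B(a, b), computed from log-gamma functions."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def balance_log_likelihood(beta, final_scores):
    """Eq. (5): sum over games of ln B(S_r + beta, S_b + beta) - ln B(beta, beta)."""
    return sum(log_beta(s_r + beta, s_b + beta) - log_beta(beta, beta)
               for s_r, s_b in final_scores)


def fit_balance(final_scores, lo=1e-2, hi=1e4, tol=1e-6):
    """Golden-section search for the beta maximizing Eq. (5), on a log scale."""
    def f(x):
        return balance_log_likelihood(math.exp(x), final_scores)

    a, b = math.log(lo), math.log(hi)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:                 # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                       # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return math.exp((a + b) / 2.0)


# Hypothetical final scores (S_r, S_b), for illustration only.
lopsided = [(50, 20), (22, 50), (50, 30), (28, 50)]
balanced = [(50, 48), (47, 50), (50, 49), (49, 50)]
beta_lopsided = fit_balance(lopsided)
beta_balanced = fit_balance(balanced)
print(beta_lopsided, beta_balanced)
```

Lopsided scores give a small estimate, while closely contested scores push the estimate toward the upper search bound, which is the numerical counterpart of the fair-coin limit.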

For a set of competition instances, numerically maximizing
Eq. (2) with respect to $\lambda_0$ and $\alpha$, and Eq. (5)
with respect to $\beta$, produces maximum likelihood parameter
estimates $\hat{\lambda}_0$, $\hat{\alpha}$, and $\hat{\beta}$. Uncertainty in these
estimates is then calculated as the standard deviation of the
bootstrap distribution [7], where we resample complete
competition instances with replacement. Table S1 gives
the global parameter estimates and uncertainties, when
applied to the full set of Halo: Reach competitions.
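
A minimal sketch of the bootstrap step, assuming that each competition instance is stored as one item in a list and that `estimator` is any function mapping a list of instances to a single number. The function names, the number of resamples, and the toy estimator are illustrative assumptions, not details from the analysis itself.

```python
import random
import statistics


def bootstrap_std(instances, estimator, n_boot=200, seed=0):
    """Standard deviation of an estimator over bootstrap resamples.

    Whole competition instances are resampled with replacement, so all
    events within an instance stay together.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(instances, k=len(instances))
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)


def mean_score_share(final_scores):
    """Toy estimator: average share of points won by team r."""
    return sum(s_r / (s_r + s_b) for s_r, s_b in final_scores) / len(final_scores)


# Hypothetical final scores (S_r, S_b), for illustration only.
games = [(50, 20), (22, 50), (50, 30), (28, 50), (50, 45), (41, 50)]
print(bootstrap_std(games, mean_score_share))
```

Any of the parameter estimators discussed above could be passed in place of the toy estimator, at the cost of one full fit per resample.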
FIG. S1. (A) Global empirical and predicted scoring rates for competitions in Halo: Reach, over the window [50,300] seconds.
(A, inset) Global empirical and predicted distribution of competitive advantages (smoothed via a Gaussian kernel). (B) For all
competitions, winner predictability (AUC) as a function of team r's points remaining, for three classifiers (see text).



III. PREDICTING COMPETITION OUTCOMES 

For a set of competitions, the predictability of an instance's
ultimate winner, after observing only part of the
game, provides a second, non-parametric measure of non-ideal
dynamics. We model scoring as a Markov chain
that terminates when a team reaches a score of 50. (In
our data, 99% of competitive instances terminate according
to this criterion; the remainder terminate at the time limit.)

Suppose an instance has evolved so that teams $r$ and
$b$ currently hold scores $s_r$ and $s_b$. The probability that
team $r$ wins the competition is then

$$\Pr(r \text{ wins} \mid s_r, s_b) = \Pr(r \text{ wins} \mid s_r + 1, s_b) \cdot c + \Pr(r \text{ wins} \mid s_r, s_b + 1) \cdot (1 - c) , \tag{6}$$

where $c = s_r / (s_r + s_b)$ is the current maximum likelihood
estimate of $r$'s scoring bias within this instance, and the
two probability terms capture the probability that $r$ wins
if $r$ (or $b$) wins the next point. (Because a team's score
is cumulative, each state in the Markov chain has only
two transitions.) Eq. (6) is then solved recursively by
computing $c$ for the current state and working backwards
to the instance's current state from the winning states
where $s_r = 50$ and $s_b < 50$.
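
The recursion in Eq. (6) can be sketched as a short memoized function. The sketch holds the bias $c$ fixed at its estimate for the current state, which is how we read the description above; the function name, the optional `c` argument, and the fair-coin default when no points have been scored are assumptions added for illustration.

```python
from functools import lru_cache


def win_probability(s_r, s_b, target=50, c=None):
    """Probability that team r reaches `target` first, following Eq. (6).

    If c is not given, it is set to s_r / (s_r + s_b), the current maximum
    likelihood estimate of r's scoring bias (0.5 if no points were scored).
    """
    if c is None:
        total = s_r + s_b
        c = s_r / total if total > 0 else 0.5

    @lru_cache(maxsize=None)
    def p(a, b):
        if a >= target:      # r has already won
            return 1.0
        if b >= target:      # b has already won
            return 0.0
        # Next point goes to r with probability c, otherwise to b.
        return c * p(a + 1, b) + (1.0 - c) * p(a, b + 1)

    return p(s_r, s_b)


print(win_probability(30, 20))   # r leads, so this exceeds 0.5
```

Predicting that team r wins whenever the returned value exceeds 0.5 gives the classifier described in the text.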

We convert this Markov chain into a classifier by predicting
that team $r$ wins if $\Pr(r \text{ wins} \mid s_r, s_b) > 0.5$. The
probability of correctly choosing the winning team in this
case is equivalent to computing the AUC statistic over a
set of instances. (AUC is defined as the area under the
receiver-operating characteristic (ROC) curve [8], and is
mathematically equivalent to the Mann-Whitney U test
for distinguishing two classes of items.)
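
The equivalence with the Mann-Whitney statistic suggests a direct way to compute the AUC: count the fraction of (winner, loser) pairs in which the winner received the higher score, with ties counted as one half. The sketch below is a simple quadratic-time illustration; the function name and the toy inputs are assumptions.

```python
def auc(scores, labels):
    """AUC via the Mann-Whitney pair-counting formulation.

    scores: classifier outputs, e.g. Pr(r wins | s_r, s_b) for each instance.
    labels: True (or 1) where team r actually won, False (or 0) otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    # A pair counts 1 if the positive outranks the negative, 0.5 on a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with hypothetical win probabilities and outcomes.
print(auc([0.9, 0.7, 0.6, 0.2], [1, 1, 0, 0]))
```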

Measuring the AUC as a function of the points remaining
provides full information about the way the competition's
predictability evolves over time. We convert this
information into a point measure by computing, with 40
points remaining for $r$, the AUC for the Markov classifier,
which we then divide by the corresponding AUC
for an "ideal" classifier (with fixed $c = 1/2$). This provides
a direct measure of how much more predictable a
real competition's outcome is relative to the ideal model
described in the main text.

Using the full data set, Figure S1B shows the full AUC-over-time
curves, for the Markov classifier, the ideal classifier
($c = 1/2$), and for a trivial classifier in which at each
moment we predict as the winner the team currently in
the lead. Our Markov classifier outperforms the trivial
classifier because it captures information about the size
of the lead, i.e., it includes information about the bias
$c$ in the Bernoulli scoring process, and outperforms the
ideal classifier because the competitions' dynamics are
non-ideal.



IV. TEST OF THE MARKOV ASSUMPTION 

We now test the accuracy of our Markov assumption in 
modeling the scoring dynamics of these competitions. If 
the arrival times of scoring events roughly follow a mem- 
oryless Poisson process, there will be little correlation 
between the sizes of subsequent delays. The correlation 
function $C(n)$ provides a direct measure of the accuracy
of the Markov assumption, and is calculated as

$$C(n) = \frac{\langle \tau_i \tau_{i+n} \rangle - \langle \tau_i \rangle^2}{\langle \tau_i^2 \rangle - \langle \tau_i \rangle^2} \tag{7}$$

where $\tau_i$ is the inter-event delay after event $i$, $n$ is a shift
size relative to $i$, and $\langle \cdot \rangle$ indicates an average over $i$. A
memoryless process matching the Markov assumption in
our Bernoulli process will produce $C(n) \approx 0$ for $n > 0$;
deviations indicate correlations (or anti-correlations) at
the corresponding time scale.

FIG. S2. Average normalized inter-arrival time between scoring
events, computed in 30 second intervals, for cohorts of
competitions lasting a specific amount of time. (inset) Autocorrelation
function $C(n)$ for inter-event times.

First, a simple rescaling of the observed inter-event de- 
lays over the course of competitions of different lengths 
produces a data collapse (Fig. S2), illustrating relatively 
little memory in the system. Second, C{n) for our entire 
sample of competitions (Fig. S2, inset) shows little cor- 
relation (memory) at any time scale. Thus, the Markov 
assumption seems largely justified. 
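
For concreteness, the correlation function can be computed as in the sketch below, which assumes the standard normalized form of Eq. (7): the lagged covariance of the delays divided by their variance. The function name and the alternating toy sequence are illustrative assumptions.

```python
def autocorrelation(delays, n):
    """Normalized autocorrelation C(n) of a sequence of inter-event delays."""
    m = len(delays) - n                      # number of (i, i + n) pairs
    mean = sum(delays) / len(delays)
    variance = sum(d * d for d in delays) / len(delays) - mean ** 2
    lagged = sum(delays[i] * delays[i + n] for i in range(m)) / m
    return (lagged - mean ** 2) / variance


# A strictly alternating sequence is perfectly anti-correlated at lag 1.
toy_delays = [1.0, 3.0] * 50
print([autocorrelation(toy_delays, n) for n in range(3)])
```

For delays drawn from a memoryless process, the values at n > 0 would instead scatter around zero.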



V. MODEL GOODNESS-OF-FIT

We now test the plausibility of our generative model,
i.e., how well it matches the underlying data, by comparing
simulated competitions against the empirical data
along specific statistical measures. This simulation is
parametric and uses the estimated parameters from our
generative model to define the corresponding probability
distributions in the simulator. A close match between
the synthetic scoring dynamics and the empirical data
along multiple statistical measures is evidence that our
generative model accurately captures the basic features
of these competitions.

The simulation framework is given in Algorithm S1.
The competition clock is started at $t = 25$ seconds to
account for the early-phase delay in the onset of scoring.
The bias in the Bernoulli process is then chosen by
drawing a value iid from the estimated Beta distribution
with parameter $\beta$. While neither of the termination criteria
has been reached, delays between scoring events are
drawn from the estimated linear non-stationary process
with parameters $\lambda_0$ and $\alpha$. Finally, given that a scoring
event occurs, with probability $c$, a single point is awarded
to team $r$; otherwise, it is awarded to $b$.

Algorithm S1: Competition simulation()
    t ← 25
    s_r ← s_b ← 0
    c ← chooseScoringBias()
    while t < 600 and s_r < 50 and s_b < 50 do
        τ ← interEventDelay()
        if t + τ < 600 then
            Δs ← numPoints()
            updateScores(s_r, s_b, Δs, c)
            t ← t + τ
        else
            break

The goodness-of-fit of the model is measured by comparing
the simulated and empirical distributions of (i)
the final total score $S$, (ii) the final lead size $L$ (at termination),
(iii) the number of leader changes $m$, and (iv) the
amount of time $t$ the leading team stays in the lead given
a lead of size $L$. Notably, each of these four quantities
is distinct from (although related to) the aspects of the data
used to estimate the parametric model's structure, and
thus they make reasonable checks on the accuracy of the
model. Figures S3A-D show the results of these tests,
using 1 million simulated competitions, illustrating very
good agreement on all dimensions between simulation
and data. Thus, the basic structure of our generative
model seems largely justified.
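
A runnable Python sketch of Algorithm S1 follows. Instead of drawing an inter-event delay explicitly, it steps the clock one second at a time and fires an event with the probability given by Eq. (1), which is one way to realize the linear non-stationary process. This stepping scheme, the function name, the assumption that every event is worth one point, and the value used for the acceleration are all assumptions of this sketch; the base rate and balance echo the global estimates in Table S1.

```python
import random


def simulate_competition(lam0, alpha, beta, rng,
                         t_start=25, t_max=600, target=50):
    """Simulate one competition and return the final scores (s_r, s_b)."""
    t = t_start
    s_r = s_b = 0
    c = rng.betavariate(beta, beta)          # scoring bias of team r
    while s_r < target and s_b < target:
        t += 1                               # advance the clock by one second
        if t >= t_max:                       # time limit reached
            break
        if rng.random() < lam0 + alpha * t:  # scoring event, Eq. (1)
            if rng.random() < c:             # point goes to r with probability c
                s_r += 1
            else:
                s_b += 1
    return s_r, s_b


# lam0 and beta follow the reported global estimates; alpha is a placeholder.
rng = random.Random(1)
finals = [simulate_competition(0.162, 7e-5, 29.5, rng) for _ in range(5)]
print(finals)
```

Distributions of the final total score, final lead size, and number of lead changes can then be accumulated over many such runs and compared with the empirical ones.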



VI. ADDITIONAL RESULTS FOR HOW 
STRUCTURE SHAPES DYNAMICS 

In the main text, we examined four pairs of compe- 
tition types that each differed on one structural fea- 
ture: team skill, environmental structure, policies, and 
resource quality. Figures S4A-D show the estimated distributions
of $\Pr(c)$ (parameterized by $\beta$) for these four
pairs. For each group of instances, the model parameter
$\beta$ was estimated following Section II from the scoring
events on the interval $t \in [30, 300]$ seconds of the competition.
These times were chosen to exclude biases due to
early- and end-phase boundary effects.

Figures S4E-H show the AUC as a function of points
remaining for the same competitions, estimated following
Section III. In each figure, we show for comparison the
AUC curve for an ideal competition ($c = 1/2$). The large
gap between the Markov classifier's AUC curve and the
ideal curve demonstrates that these competitions are substantially
more predictable than ideal competitions. This
gap is largest early in the competition, where scores are
still relatively far from the scoring limit. We also observe
modest gaps between the AUC curves for members of
each pair, illustrating that structural features do impact
the predictability of competition outcomes.
FIG. S3. Comparison of empirical (dashed blue) and simulated (parametric model, red) data for the (A) distribution of final
total scores $S = S_r + S_b$, (B) distribution of the number of times the identity of the leading team changes $m$, (C) distribution
of final lead sizes $L = |S_r - S_b|$, and (D) time $t$ elapsed as leader given a lead size of $L$. The close agreement between data and
simulation suggests that our generative model efficiently captures these competitions' dynamics.



VII. ADDITIONAL DETAILS OF 
MULTIVARIATE REGRESSION ANALYSIS 

Here we describe additional details of our investigation
of how resources, policy, environment, and skill features
explain the variance in the values $\beta$, $\lambda_0$, $\alpha$, and $p$ observed
in our data. To quantify the structure of a competition
type, we defined 35 structural features that
characterize the different combinations of environment,
resources, policies, and teams. Table S4 gives the full list
of features, with descriptions, classified into four types:
resources (R), environment (E), policies (P), and skill
(S). Applied to our data, this scheme yields 125 unique
competition types (see Table S10).

For all competition instances with a particular set of
features, we estimated the coordinates ($\beta$, $\lambda_0$, $\alpha$, $p$) following
Sections II and III. Regression models were built
on each coordinate independently, and robustness checks
were conducted to verify these results (see below). Table S5
lists the statistically significant ($p < 0.1$) features
and corresponding coefficients for all four of our models.

For competition balance $\beta$, we first used a linear model
$\beta = \theta^{T} x$, with a design matrix $x$ composed of the previously
defined 125 observations containing 35 features.
Fitting this model via least squares produced $r^2 = 0.716$
($p \ll 0.001$, F-test), but with strongly skewed residuals.
We then fitted the model $\log \beta = \theta^{T} x$ to the data, which
produced $r^2 = 0.933$ ($p \ll 0.001$, F-test), a marked improvement,
and more symmetric residuals. Examining
the coefficients, we find that evenly matched teams using
medium-to-long-range weapons, competing on large environments
without strategic or defensible positions, produce
more balanced scoring outcomes (larger $\beta$).

For the base scoring rate $\lambda_0$, a simple linear model
yields $r^2 = 0.955$ ($p \ll 0.001$, F-test), indicating that
structural features explain almost all the observed variance.
The estimated coefficients show that environmental
structure features play a dominant role in setting $\lambda_0$. In
particular, environments that are small, open, and circular
correlate best with base scoring rate. In addition to
the environment's spatial organization, evenly matched
teams also correlate with higher scoring rates. Teams
with more experience are likely to be familiar with all
terrain options and methods for their exploitation. Environments
that are small do not require competitors
to spend much time seeking out scoring opportunities
(other avatars). Lastly, environments that are open do
not provide places to avoid encounters, thus increasing
the tempo of competition.

For the acceleration $\alpha$ in the competition tempo, a
linear model produces an $r^2 = 0.652$ ($p \ll 0.001$, F-test).
We find that few of our features correlate with $\alpha$,
with the exception of long-range weapons and equally-skilled
teams, which correlate with smaller $\alpha$ (more ideal
competitions). This suggests that in competitions where
players are experienced, there is less to learn and thus $\alpha$
is low. This agrees well with the results from $\lambda_0$, where
more experience leads to a higher base scoring rate.

For the winner predictability $p$, a linear model produces
an $r^2 = 0.885$ ($p < 0.001$, F-test). Notably, features
related to neutral environments and equally-skilled
teams correlate with less predictable (more ideal) outcomes.
As expected from the correlation between $\beta$ and
$p$ (Table S2), features that correlate with greater $\beta$ typically
also correlate with lower $p$.

Finally, we expected changes in policy to have an im- 
pact on scoring balance and tempo of events. However, 
we find that policy type features do not by themselves 
play a role in controlling these dynamics, once we control 
for other variables like skill, environmental structure and 
resources. Specifically, we find that the policy feature coefficients
are insignificant in all of our models ($p > 0.1$),
and thus they were excluded from the results of our best-subset
selection.
FIG. S4. For the four dimensions discussed in the main text, (A, B, C, D) estimated distribution of scoring biases $\Pr(c)$, and
(E, F, G, H) the AUC as a function of points remaining in the competition.



Tests of model robustness 



To test the robustness of our results against spurious 
correlation, due to the high-dimensionality of our data, 
we conducted three additional analyses. 

First, we consider collinearity among the dependent
variables. Table S2 lists the pairwise coefficients of determination
$r^2$, showing a high degree of correlation between $p$
and $\log \beta$, modest correlation between $\log \beta$ and $\lambda_0$, but
little else. To test whether these correlations impact our
results, we conducted a MANOVA on a multiple multivariate
regression model (Table S6). The results show
that the same set of features reported in Table S5 are significant,
suggesting that our original results are robust.

Second, we perform a stepwise AIC feature selection
procedure to choose the best subset of features under
mild regularization. With the exception of $\alpha$, the results
shown in Tables S7, S8, and S9 indicate that the selected
features and their weights presented in the original regression
analysis are robust. The best-subset selection
for $\alpha$ produces a larger list of significant features than in
the original model, but a slightly lower $r^2$. The most significant
negative feature, long range resources, is robust
to this procedure while equally skilled teams and other
resource features are not.

Finally, we perform a randomization test by randomly 
permuting the dependent variables across the associated 
features and repeating the original multivariate regres- 
sion. This randomization destroys any natural correla- 
tion between the features and the dependent variable. 
Table S3 shows the resulting coefficients of determination,
none of which are statistically significant. These results
further support the robustness of our original results.
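
The idea behind the randomization test can be illustrated with a much simpler setting than the multivariate regression used above: a single feature and a single response, with the response values shuffled to break any real association. Everything in this sketch (the function names, the single-feature simplification, the number of permutations, and the toy data) is an assumption made for illustration.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def permutation_p_value(x, y, n_perm=500, seed=0):
    """Fraction of shuffled responses whose r^2 matches or beats the observed one."""
    rng = random.Random(seed)
    observed = r_squared(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                 # destroys any real association
        if r_squared(x, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)          # add-one correction avoids p = 0


# Toy data: a response that depends exactly linearly on the feature.
feature = [1, 2, 3, 4, 5, 6, 7, 8]
response = [2 * v + 1 for v in feature]
print(r_squared(feature, response), permutation_p_value(feature, response))
```

A small p-value here means the observed fit is rarely matched by chance, which is the complement of what the permuted regressions show: once the association is destroyed, the fit is no longer significant.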



              $\log \beta$   $\lambda_0$   $\alpha$   $p$
$\log \beta$  -              0.356         0.053      0.776
$\lambda_0$   0.356          -             0.003      0.398
$\alpha$      0.053          0.003         -
$p$           0.776          0.398                    -

TABLE S2. Coefficients of determination $r^2$ for pairs of dependent
variables. Cells containing no data are either irrelevant or
statistically insignificant ($p > 0.1$).



parameter      $r^2$   p-value
$\log \beta$   0.08    0.98
$\lambda_0$    0.12    0.84
$\alpha$       0.12    0.8
$p$            0.08    0.98

TABLE S3. Regression results after randomly permuting the
vectors of 35 independent variables and tuple of four scoring
dynamics parameters ($\log \beta$, $\lambda_0$, $\alpha$, $p$).



VIII. PLAYER PREFERENCE AND 
COMPETITION BALANCE 



When competitions are predictable they become less 
enjoyable. In professional sports this manifests itself as 
fans leaving a stadium well before the end of a game when 
one team is winning by such a large amount that there is 
little chance that the trailing team will make a comeback. 

In our model system, the same decision can occur for 
players themselves, who can effectively walk off the field 
by voluntarily exiting the competition early. For each of 
the competition types in our sample we calculated the
competition dropout rate as

$$\omega = \frac{1}{N} \sum_{k=1}^{N} \mathbb{1}\{\text{at least 1 player quits early}\} , \tag{8}$$

where $N$ is the number of instances of the given type.

From the first 25 million games, we extracted a total of 
4.1 million competitive type games that did not contain 
corrupt data. From these 4.1 million games we selected 
only those where at least one player left the game early. 
Using the remaining 1.9 million games we then tested
for a correlation between the dropout rate $\omega$ and the
overall balance $\beta$. If players prefer more balanced competitions,
as $\beta$ increases (more ideal competitions), the
dropout rate should decrease. A simple linear regression
yields the equation $\ln \omega = 1.593 - 1.371 \ln \beta$ ($r^2 = 0.43$,
$p \ll 0.001$, t-test). These results corroborate our hypothesis,
illustrating that the more predictable the scoring
dynamics of a competition (small $\beta$), the more likely at
least one player will exit early. Quantitatively, this relationship
predicts that increasing competition balance $\beta$
by a factor of 1.66 correlates with reducing the early exit
probability $\omega$ by a factor of 2.
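
The factor-of-two statement follows directly from the fitted slope, as the short Python check below shows. The function names and the example inputs are assumptions; the intercept and slope are the regression coefficients quoted above.

```python
import math


def dropout_rate(quit_flags):
    """Eq. (8): fraction of instances in which at least one player quit early."""
    return sum(quit_flags) / len(quit_flags)


def predicted_dropout(beta, intercept=1.593, slope=-1.371):
    """Dropout rate implied by the fitted log-log regression."""
    return math.exp(intercept + slope * math.log(beta))


# Raising beta by a factor of 1.66 lowers the predicted dropout rate
# by a factor of 1.66 ** 1.371, whatever the starting value of beta.
factor = predicted_dropout(10.0) / predicted_dropout(16.6)
print(round(factor, 3))
print(dropout_rate([True, False, False, True]))
```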

As a caveat, we note that there are several involuntary reasons a player may exit early, e.g., network issues, power loss, system error, or being "booted" for excessive friendly fire, and several voluntary reasons unrelated to player engagement, e.g., to join friends in another game or to change competition types. Most of these variables are inaccessible to us for analysis; however, we cannot conceive of a mechanistic relationship between most of these reasons and the scoring balance of a competition. Additional investigation may further illuminate the precise mechanism by which increases in $\beta$ produce decreased exit rates.
category                 feature                code  domain          description
resources                loadout_1              R1    {0,1}           short range and medium range
resources                loadout_2              R2    {0,1}           low quality resources
resources                loadout_3              R3    {0,1}           long range and grenades
resources                loadout_4              R4    {0,1}           short and long range
resources                loadout_5              R5    {0,1}           medium range
resources                vehicles_revenant      R6    {0,1}           lightly armored vehicle
resources                vehicles_scorpion      R7    {0,1}           heavy tank vehicle
resources                vehicles_mongoose      R8    {0,1}           unarmored vehicle
resources                vehicles_ghost         R9    {0,1}           rapid attack vehicle
resources                weapons_short          R10   {0,1}           short range
resources                weapons_medium         R11   {0,1}           medium range
resources                weapons_long           R12   {0,1}           long range
resources                weapons_grenades       R13   {0,1}           grenade type
resources                weapons_rocket         R14   {0,1}           rocket launcher
resources                weapons_unsc           R15   {0,1}           high-quality only resources
resources                weapons_covenent       R16   {0,1}           low-quality only resources
resources                weapons_both           R17   {0,1}           high- and low-quality resources
skill                    TrueSkill matchmaking  S1    {0,1}           equally skilled teams
skill                    team size              S2    {0,1}           4- or 5-person teams
environmental structure  map_open               E1    {0,1}           open terrain
environmental structure  map_vertical           E2    {0,1}           vertical environment
environmental structure  map_circular           E3    {0,1}           circular terrain
environmental structure  map_varied             E4    {0,1}           no clear organizing principle
environmental structure  map_corridors          E5    {0,1}           indoor terrain
environmental structure  map_bases              E6    {0,1}           defensible positions
environmental structure  map_towers             E7    {0,1}           high ground
environmental structure  map_transporters       E8    {0,1}           teleporters, jump pads and vents
environmental structure  map_outdoor            E9    {0,1}           outdoor terrain
environmental structure  map_size_small         E10   {0,1}           small or medium sized map
environmental structure  map_size_large         E11   {0,1}           large arena
environmental structure  map_size_perim         E12   $\mathbb{R}^+$  perimeter of map, seconds required to run in game
policy                   rules_noradar          P1    {0,1}           HUD radar is off
policy                   rules_noshields        P2    {0,1}           shield is off
policy                   rules_headshot         P3    {0,1}           headshot required for kill (SWAT rules)
policy                   rules_snipers          P4    {0,1}           sniper fight

TABLE S4. Competition features, abbreviations and verbal descriptions, grouped in four categories: resources (R), skill (S), environmental structure (E), and policy (P).



parameter     feature  coefficient       std. error        t value   Pr(>|t|)  r^2
$\log\beta$   E5       1.849             0.320             5.764     < 0.001   0.933
$\log\beta$   E1       1.391             0.371             3.745     < 0.001
$\log\beta$   E11      1.123             0.141             7.920     < 0.001
$\log\beta$   S1       0.822             0.034             23.828    < 0.001
$\log\beta$   E3       0.570             0.256             2.224     0.028
$\log\beta$   E9       0.481             0.076             6.265     < 0.001
$\log\beta$   R10      -0.354            0.134             -2.642    0.009
$\log\beta$   R8       -0.495            0.215             -2.303    0.023
$\log\beta$   R15      -0.580            0.233             -2.488    0.014
$\log\beta$   E6       -0.813            0.150             -5.414    < 0.001
$\log\beta$   E2       -1.861            0.252             -7.375    < 0.001
$\log\beta$   E7       -2.126            0.224             -9.467    < 0.001
$\lambda_0$   E5       0.082             0.008             9.966     < 0.001   0.955
$\lambda_0$   E11      0.059             0.003             16.344    < 0.001
$\lambda_0$   E1       0.045             0.009             4.774     < 0.001
$\lambda_0$   E3       0.029             0.006             4.437     < 0.001
$\lambda_0$   E9       0.023             0.001             12.028    < 0.001
$\lambda_0$   R10      0.008             0.003             2.478     0.014
$\lambda_0$   S1       0.005             0.001             6.010     < 0.001
$\lambda_0$   E4       -0.009            0.004             -2.374    0.019
$\lambda_0$   R8       -0.011            0.005             -1.995    0.048
$\lambda_0$   R13      -0.011            0.004             -2.266    0.025
$\lambda_0$   E6       -0.011            0.003             -2.845    0.005
$\lambda_0$   R2       -0.015            0.008             -1.873    0.063
$\lambda_0$   R1       -0.021            0.008             -2.680    0.008
$\lambda_0$   R4       -0.030            0.008             -3.797    < 0.001
$\lambda_0$   R15      -0.032            0.006             -5.444    < 0.001
$\lambda_0$   E2       -0.081            0.006             -12.448   < 0.001
$\lambda_0$   E7       -0.081            0.005             -13.991   < 0.001
$\alpha$      R12      -1.9 × 10^{?}     8.1 × 10^{?}      -2.449    0.016     0.652
$\alpha$      S1       -2.9 × 10^{?}     1.7 × 10^{?}      -1.692    0.093
p             E7       0.138             0.022             6.295     < 0.001   0.885
p             E2       0.123             0.024             4.989     < 0.001
p             R4       0.070             0.030             2.299     0.023
p             E6       0.061             0.014             4.175     < 0.001
p             R1       0.053             0.030             1.734     0.085
p             R15      0.046             0.022             2.030     0.044
p             R8       0.040             0.021             1.937     0.055
p             E4       0.031             0.015             2.018     0.046
p             R3       0.029             0.015             1.852     0.066
p             R14      -0.030            0.012             -2.366    0.019
p             E9       -0.036            0.007             -4.775    < 0.001
p             S1       -0.055            0.003             -16.413   < 0.001
p             E11      -0.089            0.013             -6.410    < 0.001
p             E5       -0.095            0.031             -3.020    0.003

TABLE S5. Ordered multivariate regression model coefficients for all standard ("slayer") competitions regressed onto the estimated generative model parameters $\log\beta$, $\lambda_0$, $\alpha$, and predictability measure p.
feature  df  Wilks' $\Lambda$  approx. F  num. df  den. df  Pr(>F)
R1           0.533             21.617     4        99       < 0.001
R2           0.339             48.147     4        99       < 0.001
R3           0.352             45.541     4        99       < 0.001
R4           0.716             9.802      4        99       < 0.001
R8           0.167             123.322    4        99       < 0.001
R10          0.302             57.109     4        99       < 0.001
R11          0.418             34.459     4        99       < 0.001
R12          0.383             39.799     4        99       < 0.001
R13          0.817             5.536      4        99       < 0.001
S1           0.112             194.402    4        99       < 0.001
R15          0.224             85.703     4        99       < 0.001
E1           0.455             29.610     4        99       < 0.001
E2           0.358             44.342     4        99       < 0.001
E3           0.606             16.076     4        99       < 0.001
E4           0.811             5.742      4        99       < 0.001
E5           0.246             75.711     4        99       < 0.001
E6           0.399             37.133     4        99       < 0.001
E7           0.842             4.623      4        99       0.001
E9           0.401             36.896     4        99       < 0.001
E11          0.239             78.378     4        99       < 0.001

TABLE S6. MANOVA results of multiple multivariate regression model, providing a robustness check on the results given in Table S5.
parameter     feature  coefficient  std. error  t value   Pr(>|t|)  r^2
$\log\beta$   E5       1.803        0.229       7.867     < 0.001   0.933
$\log\beta$   E1       1.320        0.228       5.779     < 0.001
$\log\beta$   E11      1.126        0.124       9.029     < 0.001
$\log\beta$   S1       0.822        0.034       24.153    < 0.001
$\log\beta$   E3       0.480        0.122       3.919     < 0.001
$\log\beta$   E9       0.479        0.069       6.888     < 0.001
$\log\beta$   R13      0.154        0.069       2.243     0.027
$\log\beta$   R14      0.119        0.074       1.598     0.113
$\log\beta$   R1       -0.322       0.054       -5.952    < 0.001
$\log\beta$   R3       -0.232       0.092       -2.505    0.013
$\log\beta$   R12      -0.310       0.110       -2.822    0.005
$\log\beta$   R10      -0.367       0.113       -3.232    0.001
$\log\beta$   R8       -0.472       0.181       -2.596    0.01
$\log\beta$   R4       -0.504       0.062       -8.081    < 0.001
$\log\beta$   R15      -0.644       0.092       -6.931    < 0.001
$\log\beta$   E6       -0.827       0.130       -6.353    < 0.001
$\log\beta$   E2       -1.860       0.207       -8.957    < 0.001
$\log\beta$   E7       -2.093       0.193       -10.840   < 0.001
$\lambda_0$   E5       0.084        0.006       13.770    < 0.001   0.954
$\lambda_0$   E11      0.061        0.002       20.759    < 0.001
$\lambda_0$   E3       0.029        0.003       8.648     < 0.001
$\lambda_0$   E9       0.024        0.001       12.383    < 0.001
$\lambda_0$   R10      0.008        0.003       2.794     0.006
$\lambda_0$   R3       0.005        0.002       2.080     0.039
$\lambda_0$   S1       0.005        0.001       6.085     < 0.001
$\lambda_0$   E1       0.048        0.005       8.880     < 0.001
$\lambda_0$   R13      -0.009       0.002       -3.979    < 0.001
$\lambda_0$   E4       -0.008       0.002       -3.178    0.001
$\lambda_0$   R8       -0.011       0.004       -2.467    0.015
$\lambda_0$   E6       -0.012       0.003       -3.860    < 0.001
$\lambda_0$   R2       -0.015       0.005       -2.939    0.004
$\lambda_0$   R1       -0.022       0.005       -4.191    < 0.001
$\lambda_0$   R4       -0.031       0.005       -5.852    < 0.001
$\lambda_0$   R15      -0.034       0.004       -8.469    < 0.001
$\lambda_0$   E7       -0.080       0.004       -16.695   < 0.001
$\lambda_0$   E2       -0.081       0.005       -14.457   < 0.001

TABLE S7. Ordered multivariate regression model coefficients for all standard ("slayer") competitions regressed onto $\log\beta$ and $\lambda_0$, selected via stepwise AIC, providing a second check on the robustness of the results in Table S5.
parameter  feature  coefficient  std. error  t value   Pr(>|t|)  r^2
p          E7       0.124        0.010       11.934    < 0.001   0.882
p          E2       0.111        0.011       9.943     < 0.001
p          R4       0.067        0.010       6.444     < 0.001
p          E6       0.052        0.005       8.998     < 0.001
p          R1       0.049        0.010       4.958     < 0.001
p          R8       0.046        0.016       2.779     0.006
p          R15      0.045        0.006       7.335     < 0.001
p          E4       0.039        0.007       5.456     < 0.001
p          R2       0.037        0.010       3.533     < 0.001
p          R3       0.027        0.008       3.420     < 0.001
p          E9       -0.034       0.006       -4.912    < 0.001
p          R14      -0.036       0.006       -5.971    < 0.001
p          S1       -0.055       0.003       -16.763   < 0.001
p          E5       -0.076       0.010       -7.429    < 0.001
p          E11      -0.081       0.006       -12.389   < 0.001

TABLE S8. Ordered multivariate regression model coefficients for all standard ("slayer") competitions regressed onto p, selected via stepwise AIC, providing a second check on the robustness of the results in Table S5.



parameter  feature  coefficient (× 10^{?})  std. error (× 10^{?})  t value   Pr(>|t|)  r^2
$\alpha$   R3       1.570                   2.583                  6.077     < 0.001   0.637
$\alpha$   R11      1.446                   3.328                  4.345     < 0.001
$\alpha$   R2       1.432                   2.965                  4.832     < 0.001
$\alpha$   E5       1.105                   2.114                  5.226     < 0.001
$\alpha$   E3       0.454                   2.368                  1.918     0.057
$\alpha$   S1       -0.294                  1.689                  -1.746    0.083
$\alpha$   R1       -0.470                  2.529                  -1.859    0.065
$\alpha$   R15      -1.591                  2.583                  -6.157    < 0.001
$\alpha$   R8       -1.868                  7.159                  -2.609    0.010
$\alpha$   R12      -2.551                  2.538                  -10.053   < 0.001

TABLE S9. Ordered multivariate regression model coefficients for all standard ("slayer") competitions regressed onto $\alpha$, selected via stepwise AIC, providing a second check on the robustness of the results in Table S5.



